-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Implemented MultiIndex.searchsorted method ( GH14833) #61435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks @GSAUC3 for the PR. Is there and issue, or has there been any discussion about this elsewhere? |
Hi @datapythonista, this was the issue #14833 , against which i made the pull request. |
HI, I had ran pytest and pre-commit locally, is it possible to run all these test locally? |
@GSAUC3 Add a return statement in your function after
|
hi, @datapythonista , I am having trouble running all these tests, locally, before committing my code, so far i have only ran the pytest, locally, and that worked, could you please, guide me, how to set up the testing environment locally, before each commit? will running |
pre-commit should run automatically if it's set up to work as intended. You have all the information on how to set up the development environment, run tests... in the development documentation: https://pandas.pydata.org/docs/development/index.html |
Hi, @datapythonista, this part of the error messages tells, us that searchsorted method should fail, but it is passing, am i correct?
|
Yes, that's correct. I guess we have an xfail for the test that should be removed. |
Hi @datapythonista . Thank you for your suggestions, I've addressed the feedback from earlier and the CI checks are now passing. This PR should be ready for review whenever you get a chance. Please let me know if any changes are required. Thanks again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, added few comments.
You will have to add a not in the whatsnew for 3.0. You can check other PRs to see how we do it.
pandas/tests/base/test_misc.py
Outdated
# if isinstance(obj, pd.MultiIndex): | ||
# # See gh-14833 | ||
# request.applymarker( | ||
# pytest.mark.xfail( | ||
# reason="np.searchsorted doesn't work on pd.MultiIndex: GH 14833" | ||
# ) | ||
# ) | ||
if obj.dtype.kind == "c" and isinstance(obj, Index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this instead of commenting please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, I will do it right away.
.gitignore
Outdated
@@ -141,3 +141,5 @@ doc/source/savefig/ | |||
# Pyodide/WASM related files # | |||
############################## | |||
/.pyodide-xbuildenv-* | |||
|
|||
*.ipynb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to remove this. We do have some notebooks in this repo iirc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will remove it.
@@ -1029,3 +1029,28 @@ def test_get_loc_namedtuple_behaves_like_tuple(): | |||
assert idx.get_loc(("i1", "i2")) == 0 | |||
assert idx.get_loc(("i3", "i4")) == 1 | |||
assert idx.get_loc(("i5", "i6")) == 2 | |||
|
|||
|
|||
def test_searchsorted(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to divide this test. Ideally, when a test fails, we want to know what's wrong just by checking the test name, and also we want to still test everything else. With a single test, the first thing that fails will make the whole test fail. You can use pytest fixtures and parametrize to not repeat too much code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion, it is quite insightful for me. I have created three different functions as follows:
def test_searchsorted_single():... # for single inputs
def test_searchsorted_lists():... # list of single inputs
def test_searchsorted_invalid():... # for invalid inputs
Would this be the correct way to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that may seem silly, but then when a test fails it's very easy to know what it fails. Ideally an assert per test, and if more than one input is asserted, then pytest.mark.parametrize to reuse a test with multiple inputs.
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.